Take Home_Ex03

Author

Dabbie Neo

Published

June 3, 2023

Modified

June 18, 2023

1. Background

FishEye International, a non-profit focused on countering illegal, unreported, and unregulated (IUU) fishing, has been given access to an international finance corporation’s database on fishing related companies. In the past, FishEye has determined that companies with anomalous structures are far more likely to be involved in IUU (or other “fishy” business). FishEye has transformed the database into a knowledge graph. It includes information about companies, owners, workers, and financial status. FishEye is aiming to use this graph to identify anomalies that could indicate a company is involved in IUU.

With reference to Mini-Challenge 3 of VAST Challenge 2023 and by using appropriate static and interactive statistical graphics methods, we will be helping FishEye to better understand fishing business anomalies.

2. Data Source

The data is taken from the Mini-Challenge 3 of VAST Challenge 2023.

3. Data Preparation

3.1 Install and launching R packages

The code chunk below uses p_load() of the pacman package to check whether the listed packages are installed, install any that are missing, and load them into R. The packages loaded are:

pacman::p_load(jsonlite, tidygraph, ggraph, 
               visNetwork, graphlayouts, ggforce, 
               skimr, tidytext, tidyverse, patchwork, ggiraph, ggrepel)
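For readers unfamiliar with pacman, the idea behind p_load() can be approximated in base R as below. This is a rough sketch rather than pacman's actual implementation; the helper name load_or_install() is hypothetical, and the example uses packages that ship with R so that nothing is downloaded.

```r
# Hypothetical helper that mimics the idea behind pacman::p_load():
# install a package only if it is missing, then attach it.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# "stats" and "utils" ship with R, so no installation is triggered here
load_or_install(c("stats", "utils"))
```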

3.2 Loading the Data

fromJSON() of jsonlite package is used to import MC3.json into R environment.

mc3_data <- fromJSON("data/MC3.json")

The output is called mc3_data. It is a large list R object.
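To see why the result is a list, the sketch below parses a small made-up JSON string that mirrors only the nodes and links elements used in this exercise. The toy records are assumptions for illustration and are not taken from MC3.json.

```r
library(jsonlite)

# Toy JSON with "nodes" and "links" elements (contents are made up)
toy_json <- '{
  "nodes": [
    {"id": "Company A", "type": "Company"},
    {"id": "Jane Doe",  "type": "Beneficial Owner"}
  ],
  "links": [
    {"source": "Company A", "target": "Jane Doe", "type": "Beneficial Owner"}
  ]
}'

toy_data <- fromJSON(toy_json)

names(toy_data)        # top-level elements of the list
class(toy_data$nodes)  # arrays of records are simplified to data frames
nrow(toy_data$links)
```

Each top-level JSON key becomes a list element, which is why the nodes and links tables are later accessed with the $ operator.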

3.3 Extracting edges

The code chunk below will be used to extract the links data.frame of mc3_data and save it as a tibble data.frame called mc3_edges.

mc3_edges <- as_tibble(mc3_data$links) %>% 
  distinct() %>%
  mutate(source = as.character(source),
         target = as.character(target),
         type = as.character(type)) %>%
  group_by(source, target, type) %>%
    summarise(weights = n()) %>%
  filter(source!=target) %>%
  ungroup()

3.4 Extracting nodes

The code chunk below will be used to extract the nodes data.frame of mc3_data and save it as a tibble data.frame called mc3_nodes.

mc3_nodes <- as_tibble(mc3_data$nodes) %>%
  mutate(country = as.character(country),
         id = as.character(id),
         product_services = as.character(product_services),
         revenue_omu = as.numeric(as.character(revenue_omu)),
         type = as.character(type)) %>%
  select(id, country, type, revenue_omu, product_services) #select() used to organise the sequence of col

4. Data Exploration and Data Wrangling

4.1 Exploring the edges data frame

In the code chunk below, skim() of skimr package is used to display the summary statistics of mc3_edges tibble data frame.

skim(mc3_edges)
Data summary
Name mc3_edges
Number of rows 24036
Number of columns 4
_______________________
Column type frequency:
character 3
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
source 0 1 6 700 0 12856 0
target 0 1 6 28 0 21265 0
type 0 1 16 16 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
weights 0 1 1 0 1 1 1 1 1 ▁▁▇▁▁

The report above reveals that there are no missing values in any field.
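The summary also shows that weights is always 1 (mean 1, sd 0). This follows from calling distinct() before counting in section 3.3. The toy edge list below is made up for illustration, and counting before de-duplication is shown only as an alternative, not as the approach used in this exercise.

```r
library(dplyr)

# Toy edge list: the A -> X link is recorded twice, and B -> B is a self-loop
toy_links <- tibble(
  source = c("A", "A", "A", "B"),
  target = c("X", "X", "Y", "B"),
  type   = "Beneficial Owner"
)

# Same order of operations as the chunk in 3.3: de-duplicate, then count
weights_after_distinct <- toy_links %>%
  distinct() %>%
  count(source, target, type, name = "weights") %>%
  filter(source != target)

# Alternative: count first, so repeated links show up as weights > 1
weights_from_raw <- toy_links %>%
  count(source, target, type, name = "weights") %>%
  filter(source != target)

weights_after_distinct$weights  # every weight is 1
weights_from_raw$weights        # the repeated A -> X link has weight 2
```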

In the code chunk below, datatable() of DT package is used to display mc3_edges tibble data frame as an interactive table on the html document.

DT::datatable(mc3_edges)
Note

The edge table helps us understand the relationship between source and target. Here, source is the company, and its relationship with the target is given by the type column. There are two kinds of relationship: Beneficial Owner and Company Contacts.

Plotting the variables in edge dataframe

Below is the code chunk using ggplot to plot the distribution of the following:

  • Distribution of the type of relationship that exist between the source and target and their corresponding frequency.
#| echo: false
#| fig-width: 3
#| fig-height: 3

# Plot distribution of type 
hist_type <- ggplot(data = mc3_edges,
       aes(x = type)) +
  geom_bar() +
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.1) +  # after_stat() replaces the deprecated ..count..
  labs(title = "Distribution of Relationship Types", x = "Type", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"))
  • Number of companies that a beneficial owner owns
#| echo: false
#| fig-width: 3
#| fig-height: 3

#Filter the type == "Beneficial Owner" 
mc3_edges_owner <- mc3_edges %>%
  filter(type == "Beneficial Owner") %>% 
  group_by(target, type) %>%
    summarise(no_of_companies = n()) %>%
  ungroup()

# Create a ggplot histogram to plot the no of companies a beneficial owner owns
gg_hist_own <- ggplot(mc3_edges_owner, aes(x = no_of_companies)) +
  geom_histogram(binwidth = 1, fill = "steelblue") +  # one bin per whole number of companies, so bars line up with the labels
  labs(title = "No of companies beneficial owners own", x = "No of companies", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold")) +
  scale_x_continuous(breaks = seq(min(mc3_edges_owner$no_of_companies), max(mc3_edges_owner$no_of_companies), by = 1))

# Calculate frequency counts for each bin
freq_counts <- table(mc3_edges_owner$no_of_companies)

# Create a data frame for labels
label_data <- data.frame(x = as.numeric(names(freq_counts)), y = as.numeric(freq_counts))

# Add frequency labels to the plot
gg_hist_own <- gg_hist_own +
  geom_text(
    data = label_data,
    aes(x = x, y = y, label = y),
    vjust = -0.5,
    size = 3
  )
#| echo: false
#| fig-width: 3
#| fig-height: 3

#Combining the two plots using patchwork
combined_plot <- hist_type / gg_hist_own
combined_plot

As seen from the plot above, there are 16,792 Beneficial Owner relationships and 7,244 Company Contacts relationships.

The plot also shows that most beneficial owners own a single company. In fact, fewer than 0.5% of beneficial owners own more than 3 companies. Such owners are unusual enough to warrant suspicion, and we investigate them later. For instance, we could look at the size of their companies in terms of revenue and number of owners.
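A share such as the one quoted above can be computed as the mean of a logical vector. The counts below are made-up toy values, not the real data. On the real data, the same expression could be applied to mc3_edges_owner$no_of_companies.

```r
# Toy counts of companies per beneficial owner (made-up values)
toy_no_of_companies <- c(1, 1, 1, 1, 1, 2, 3, 5)

# mean() of a logical vector gives the proportion of TRUE values
share_gt3 <- mean(toy_no_of_companies > 3) * 100
share_gt3  # 12.5, since 1 of the 8 toy owners has more than 3 companies
```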

Creating new edge dataframe

Below is the code chunk to create a new edge dataframe called mc3_edges_with_no_of_companies, which has the no_of_companies column added in.

#| echo: false
#| fig-width: 3
#| fig-height: 3

# Join the no_of_companies column from mc3_edges_owner into mc3_edges
mc3_edges_with_no_of_companies <- mc3_edges %>%
  left_join(mc3_edges_owner %>% select(target, no_of_companies),
            by = c("target" = "target")) %>%
  mutate(no_of_companies = ifelse(is.na(no_of_companies), 0, no_of_companies))

# View the updated mc3_edges
mc3_edges_with_no_of_companies
# A tibble: 24,036 × 5
   source                      target             type   weights no_of_companies
   <chr>                       <chr>              <chr>    <int>           <dbl>
 1 1 AS Marine sanctuary       Christina Taylor   Compa…       1               1
 2 1 AS Marine sanctuary       Debbie Sanders     Benef…       1               1
 3 1 Ltd. Liability Co Cargo   Angela Smith       Benef…       1               1
 4 1 S.A. de C.V.              Catherine Cox      Compa…       1               0
 5 1 and Sagl Forwading        Angela Mendoza     Compa…       1               0
 6 1 and Sagl Forwading        Christopher Watson Benef…       1               1
 7 2 Limited Liability Company Amanda Mcdonald    Benef…       1               1
 8 2 Limited Liability Company Megan Padilla      Compa…       1               0
 9 2 Limited Liability Company Monica Martinez    Compa…       1               0
10 2 Limited Liability Company Teresa Collins     Benef…       1               1
# ℹ 24,026 more rows
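Because the counts come from Beneficial Owner edges only and are joined back by target, a person who is both an owner and a contact carries the owner count on every row. The sketch below illustrates this with a made-up edge list; it uses count() and coalesce() as a compact equivalent of the group_by()/summarise() and ifelse() calls above.

```r
library(dplyr)

# Toy edges: Ann owns two companies and is a contact for a third;
# Bob is only a company contact
toy_edges <- tibble(
  source = c("Co1", "Co2", "Co3", "Co4"),
  target = c("Ann", "Ann", "Ann", "Bob"),
  type   = c("Beneficial Owner", "Beneficial Owner",
             "Company Contacts", "Company Contacts")
)

# Number of companies per beneficial owner
owner_counts <- toy_edges %>%
  filter(type == "Beneficial Owner") %>%
  count(target, name = "no_of_companies")

# Join back onto all edges; targets that never own a company get 0
joined <- toy_edges %>%
  left_join(owner_counts, by = "target") %>%
  mutate(no_of_companies = coalesce(no_of_companies, 0L))

joined
```

Here Ann's Company Contacts row also gets no_of_companies = 2, while Bob gets 0.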

4.2 Exploring the nodes data frame

In the code chunk below, skim() of skimr package is used to display the summary statistics of mc3_nodes tibble data frame.

skim(mc3_nodes)
Data summary
Name mc3_nodes
Number of rows 27622
Number of columns 5
_______________________
Column type frequency:
character 4
numeric 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
id 0 1 6 64 0 22929 0
country 0 1 2 15 0 100 0
type 0 1 7 16 0 3 0
product_services 0 1 4 1737 0 3244 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
revenue_omu 21515 0.22 1822155 18184433 3652.23 7676.36 16210.68 48327.66 310612303 ▇▁▁▁▁

There are a large number of missing values in the revenue_omu column: 21,515 of 27,622 rows (a complete rate of 0.22).

In the code chunk below, datatable() of DT package is used to display mc3_nodes tibble data frame as an interactive table on the html document.

DT::datatable(mc3_nodes)
Note

Observing the nodes table above, we notice that some node ids are not unique: the same id may appear with more than one country, more than one product_services entry, and/or more than one revenue value. This could be one way to infer the size of a company; if it operates in more than one country and/or offers many products, it is likely a big company.

Handling of missing and/or unknown values

Notice that the product_services column contains NA or character(0) values, which carry no information, so we replace them with “unknown”. NA values in the revenue_omu column are replaced with 0.

#| echo: false
#| fig-width: 3
#| fig-height: 3

# "character(0)" marks an empty product_services entry; missing revenue becomes numeric 0
mc3_nodes <- mc3_nodes %>%
  mutate(product_services = if_else(is.na(product_services) | product_services == "character(0)",
                                    "unknown", product_services),
         revenue_omu = replace_na(revenue_omu, 0))
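Replacing missing revenue with 0 keeps the column usable in plots, but it also changes summary statistics. The sketch below uses made-up rows to show the effect on the median; the toy values are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy nodes: two of three have no product description and no revenue
toy_nodes <- tibble(
  id               = c("A", "B", "C"),
  product_services = c("character(0)", "Fish canning", "character(0)"),
  revenue_omu      = c(NA, 5000, NA)
)

cleaned <- toy_nodes %>%
  mutate(
    product_services = if_else(product_services == "character(0)",
                               "unknown", product_services),
    revenue_omu = replace_na(revenue_omu, 0)  # stays numeric
  )

# Median over all rows after zero-filling vs. median of reported values only
median_with_zeros    <- median(cleaned$revenue_omu)
median_reported_only <- median(toy_nodes$revenue_omu, na.rm = TRUE)
```

In this toy case the zero-filled median is 0 while the median of reported values is 5000, which is why zero revenues are worth filtering out before summarising.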

Checking for duplicate nodes and removing them

#| echo: false
#| fig-width: 3
#| fig-height: 3

# Calculate the number of duplicates in mc3_nodes
num_duplicates_nodes <- sum(duplicated(mc3_nodes))

# Display the number of duplicates
#num_duplicates_nodes

# Remove duplicates from mc3_nodes
mc3_nodes_unique <- distinct(mc3_nodes)

There are a total of 2595 duplicated nodes. These duplicates are removed and a new nodes data frame, mc3_nodes_unique, is created.
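Note that removing duplicated rows does not make ids unique, since rows that differ in any column are kept. The base R sketch below illustrates the difference on a made-up data frame.

```r
# Toy nodes: rows 1 and 2 are exact duplicates; id "B" appears with two countries
toy_nodes <- data.frame(
  id      = c("A", "A", "B", "B"),
  country = c("X", "X", "Y", "Z")
)

# duplicated() flags rows that repeat an earlier row in every column
n_duplicate_rows <- sum(duplicated(toy_nodes))

# unique() keeps one copy of each distinct row (like dplyr::distinct())
toy_unique <- unique(toy_nodes)

# ids can still repeat after de-duplication
n_repeated_ids <- sum(duplicated(toy_unique$id))
```

Here one duplicated row is dropped, three rows remain, and id "B" still appears twice.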

4.2.1 Distribution of the type of nodes

Below is the code chunk to plot the distribution of the nodes type.

#| echo: false
#| fig-width: 3
#| fig-height: 3

hist_type_node <- ggplot(data = mc3_nodes_unique,
       aes(x = type)) +
  geom_bar()+
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.1) +  # after_stat() replaces the deprecated ..count..
  labs(title = "Distribution of Node Type", x = "Type", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold")) 
  
#hist_type_node

4.2.2 Distribution of the product_services

In this section, we will perform text sensing using appropriate functions of tidytext package.

To begin, we will employ the tokenisation process. In text sensing, tokenisation is the process of breaking up a given text into units called tokens. Tokens can be individual words, phrases or even whole sentences. In the process of tokenisation, some characters like punctuation marks may be discarded. The tokens usually become the input for the processes like parsing and text mining.

In the code chunk below, unnest_tokens() of tidytext is used to split text in product_services field into words.

token_nodes <- mc3_nodes_unique %>%
  unnest_tokens(word, 
                product_services)

The two basic arguments to unnest_tokens() used here are column names. First we have the output column name that will be created as the text is unnested into it (word, in this case), and then the input column that the text comes from (product_services, in this case).

Note
  • By default, punctuation is stripped.

  • By default, unnest_tokens() converts the tokens to lowercase, which makes them easier to compare or combine with other datasets. (Use the to_lower = FALSE argument to turn off this behavior).
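The two defaults described above can be seen on a single made-up product description. The sketch below is illustrative only and does not use the MC3 data.

```r
library(dplyr)
library(tidytext)

# One toy record with mixed case and punctuation
toy <- tibble(id = "A", product_services = "Fresh Fish, Seafood and Shrimp!")

# Default behaviour: punctuation stripped, tokens lowercased
tokens_default <- toy %>%
  unnest_tokens(word, product_services)

# Keep the original case with to_lower = FALSE
tokens_cased <- toy %>%
  unnest_tokens(word, product_services, to_lower = FALSE)

tokens_default$word
tokens_cased$word
```

Each token becomes its own row, and the other columns (here id) are repeated alongside it.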

Now we can visualise the words extracted by using the code chunk below.

token_nodes %>%
  count(word, sort = TRUE) %>%
  top_n(5) %>%
  mutate(word = reorder(word, n)) 
# A tibble: 5 × 2
  word         n
  <fct>    <int>
1 unknown  21009
2 and       6389
3 products  1860
4 of         881
5 as         752

The tibble above reveals that the most frequent words include some that are not useful for analysis, for instance “and” and “of”. In text mining, such words are called stop words. The tidytext package provides a data set called stop_words that can help us remove them.

stopwords_removed <- token_nodes %>% 
  anti_join(stop_words, by = "word")  # explicit key avoids the "Joining with" message
Note

There are two processes:

  • Load the stop_words data included with tidytext. This data is simply a list of words that you may want to remove in a natural language analysis.

  • Then anti_join() of dplyr package is used to remove all stop words from the analysis.
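As a minimal sketch of the anti-join step, the toy token table below keeps only the rows whose word does not appear in stop_words. The token list is made up for illustration.

```r
library(dplyr)
library(tidytext)

# Toy tokens: "and" and "of" are stop words, the rest are not
toy_tokens <- tibble(word = c("fish", "and", "of", "seafood", "tuna"))

# anti_join() keeps rows of toy_tokens that have no match in stop_words
kept <- toy_tokens %>%
  anti_join(stop_words, by = "word")

kept$word
```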

stopwords_removed %>%
  filter(!word %in% c("unknown", "services", "related","including", "offers","range")) %>%    #filter away meaningless words 
  count(word, sort = TRUE) %>%
  top_n(20) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  # labs() refers to the aesthetics, not the flipped screen axes
  labs(x = "Unique words",
       y = "Count",
       title = "Count of unique words found in product_services field")

The below code chunk will better help us categorise our product_services for analysis into fishing related, non-fishing related and unknown.

#Create a list of fishing related words 
include_words <- c("fish", "fishing", "seafood", "seafoods","prawns","prawn", "salmon","tuna","shrimp","shrimps","crab","squid","oyster","clam","mollusks","crustaceans","roe","fillet","haddock","octopus","herring","lobsters","seabass","cephalopods","cod","shellfish","shark","chum")

#Use the grepl() function to create a logical vector indicating whether each word in mc3_nodes_unique$product_services is found in the include_words list. Store the result in a new column called category
mc3_nodes_unique$category <- ifelse(grepl(paste0("\\b", paste(include_words, collapse = "\\b|\\b"), "\\b"), 
                                         tolower(mc3_nodes_unique$product_services)),
                                   "Fishing-related",
                                   ifelse(tolower(mc3_nodes_unique$product_services) == "unknown",
                                          "Unknown",
                                          "Non-fishing related"))

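The word-boundary matching used for this categorisation can be checked on a few made-up descriptions. The shortened word list and the texts below are assumptions for illustration. Note that \\b only matches whole words, so "shellfish" is not caught by "fish" and needs its own entry in the list.

```r
# Shortened, illustrative word list (the real list is longer)
include_words <- c("fish", "tuna", "cod")

# Same pattern construction as above: \\bfish\\b|\\btuna\\b|\\bcod\\b
pattern <- paste0("\\b", paste(include_words, collapse = "\\b|\\b"), "\\b")

toy_text <- c("Frozen Tuna and cod fillets",
              "Codified legal services",
              "unknown",
              "Shellfish processing")

is_fishing <- grepl(pattern, tolower(toy_text))

category <- ifelse(is_fishing,
                   "Fishing-related",
                   ifelse(tolower(toy_text) == "unknown",
                          "Unknown",
                          "Non-fishing related"))

data.frame(toy_text, is_fishing, category)
```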
Next, we look at the distribution of the categories and the median revenue of each. The pie chart below shows that only a small percentage (4%) of companies offer fishing-related product_services. The bar chart shows that the median revenue for the fishing-related category is around 29,811.38 OMU.

#| echo: false
#| fig-width: 3
#| fig-height: 3

library(dplyr)
library(ggplot2)
library(ggrepel)

# Define the colors for each category
category_colors <- c("Fishing-related" = "#B4D4E7", "Non-fishing related" = "#B4E7BD", "Unknown" = "#D3D3D3")

# Set the category as a factor with desired order
category_freq <- mc3_nodes_unique %>%
  mutate(category = factor(category, levels = c("Fishing-related", "Non-fishing related", "Unknown"))) %>%
  count(category) %>%
  mutate(percentage = prop.table(n) * 100)

# Create a pie chart with labels
ggplot_cat <- ggplot(category_freq, aes(x = "", y = n, fill = category)) +
  geom_bar(width = 1, stat = "identity", color = "black") +
  coord_polar(theta = "y") +
  xlab("") +
  ylab("") +
  labs(title = "Distribution of Category") +
  theme_void() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5, face = "bold")) +
  geom_label_repel(aes(label = paste0(category, "\nCount: ", n, "\n", round(percentage, 1), "%")),
                   box.padding = 0.5,
                   point.padding = 0.1,
                   segment.color = "black",
                   show.legend = FALSE,
                   label.color = "black") +
  scale_fill_manual(values = category_colors)
#| echo: false
#| fig-width: 3
#| fig-height: 3

#Convert revenue_omu to numeric
mc3_nodes_unique <- mc3_nodes_unique %>% 
  mutate(revenue_omu = as.numeric(revenue_omu))

# Define the colors for each category
category_colors <- c("Fishing-related" = "#B4D4E7", "Non-fishing related" = "#B4E7BD", "Unknown" = "#D3D3D3")

# Calculate the median revenue_omu for each category
median_revenue <- mc3_nodes_unique %>%
  group_by(category) %>%
  filter(category != "Non-fishing related" | (category == "Non-fishing related" & revenue_omu != 0 & !is.na(revenue_omu))) %>%
  summarize(median_revenue_omu = median(revenue_omu, na.rm = TRUE))

# Plot the bar chart
ggplot_rev <- ggplot(median_revenue, aes(x = category, y = median_revenue_omu, fill = category)) +
  geom_col() +
  scale_fill_manual(values = category_colors) +
  xlab("Category") +
  ylab("Median Revenue (OMU)") +
  labs(title = "Median Revenue by Category") +
  theme_bw() +
  theme(plot.title = element_text(face = "bold")) +
  geom_text(aes(label = round(median_revenue_omu, 2)), vjust = -0.5)
#| echo: false
#| fig-width: 3
#| fig-height: 3

combined_plot2 <- ggplot_cat / ggplot_rev
combined_plot2

#| echo: false
#| fig-width: 3
#| fig-height: 3

## Adding the `mc3_nodes_unique` attributes, consider both beneficial owners and company contacts
filtered_mc3_edges <- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3)


# Create a data frame with source nodes and rename column
id4 <- filtered_mc3_edges %>%
  select(source) %>%
  rename(id = source) %>%
  mutate(type_node = "company")

# Create a data frame with target nodes and rename column
id5 <- filtered_mc3_edges %>%
  select(target, type) %>%
  rename(id = target, type_node = type)

# Combine the two data frames and remove duplicates
mc3_nodes3 <- rbind(id4, id5) %>%
  distinct() %>%
  left_join(mc3_nodes_unique,
            unmatched = "drop") %>%
  distinct()

mc3_nodes3 <- mc3_nodes3 %>%
  mutate(revenue_omu = ifelse(revenue_omu == "" | is.na(revenue_omu), "0", revenue_omu))


# Convert the revenue column to numeric (if it's not already numeric)
mc3_nodes3$revenue_omu <- as.numeric(mc3_nodes3$revenue_omu)

# Calculate the revenue threshold for the top 10% (90th percentile), ignoring missing values
revenue_threshold <- quantile(mc3_nodes3$revenue_omu, probs = 0.90, na.rm = TRUE)

# Filter the DataFrame to retain only the rows with revenue above the threshold
filtered_mc3_nodes <- mc3_nodes3[mc3_nodes3$revenue_omu > revenue_threshold, ]
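To confirm that a 0.90 quantile threshold keeps the top 10% of rows, the sketch below applies the same two steps to made-up revenue values.

```r
# Toy revenues: 100 evenly spaced values from 1,000 to 100,000
toy_revenue <- seq(1000, 100000, by = 1000)

# 90th percentile, as in the chunk above
toy_threshold <- quantile(toy_revenue, probs = 0.90, na.rm = TRUE)

# Rows strictly above the threshold
top_rows <- toy_revenue[toy_revenue > toy_threshold]
length(top_rows)  # 10 of 100 values, i.e. the top 10%
```

With heavily tied data (for example, many zero revenues) the share above the threshold can differ from exactly 10%.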
#| echo: false
#| fig-width: 5
#| fig-height: 6

# Create a bar chart of revenue vs ID using ggplot
bar_plot_toprev <- ggplot(filtered_mc3_nodes, aes(x = reorder(id, revenue_omu), y = revenue_omu/1000)) +
  geom_bar_interactive(aes(tooltip = paste("ID:", id,
                                           "<br>Type:", type_node,
                                           "<br>Country:", country,
                                           "<br>Revenue:", revenue_omu,
                                           "<br>Product Services:", product_services)),
                       stat = "identity", fill = "steelblue") +
  labs(x = "id", y = "Revenue_omu ('000)", title = "Top 10% ids") +
  coord_flip() +
  theme(plot.title = element_text(face = "bold"))+
  theme(axis.text.y = element_text(size = 6))

# Print the bar plot
girafe(ggobj = bar_plot_toprev,
       width_svg = 8,
  height_svg = 8*0.618)

5. Network Visualisation and Analysis

5.1 Building network model with tidygraph for Beneficial Owners

Our earlier analysis of the edge data frame showed that fewer than 0.5% of beneficial owners own more than 3 companies. Because this is unusual, we investigate these owners further by plotting a network graph of their relationships with companies and other owners.

Preparing edge data table

# Keep beneficial owners who own more than 3 companies
filtered_mc3_edges_owner <- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3, type == "Beneficial Owner")

Preparing nodes data table

Instead of using the nodes data table extracted from mc3_data, we will prepare a new nodes data table by using the source and target fields of filtered_mc3_edges_owner data table. This is necessary to ensure that the nodes in nodes data tables include all the source and target values.

#| echo: false
#| fig-width: 3
#| fig-height: 3

# Create a data frame with source nodes and rename column
id1 <- filtered_mc3_edges_owner %>%
  select(source) %>%
  rename(id = source) %>%
  mutate(type_node = "company")

# Create a data frame with target nodes and rename column
id2 <- filtered_mc3_edges_owner %>%
  select(target, type) %>%
  rename(id = target, type_node = type)

# Combine the two data frames and remove duplicates
mc3_nodes1 <- rbind(id1, id2) %>%
  distinct() 
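The same construction can be checked on a made-up edge list. The sketch below uses bind_rows() and transmute() as a compact equivalent of the select()/rename()/rbind() steps above, and then verifies that every edge endpoint is present in the node table.

```r
library(dplyr)

# Toy edges: Ann owns two companies, Bob owns one
toy_edges <- tibble(
  source = c("Co1", "Co2", "Co3"),
  target = c("Ann", "Ann", "Bob"),
  type   = "Beneficial Owner"
)

# Stack sources (companies) and targets (people), then drop repeats
toy_nodes <- bind_rows(
  toy_edges %>% transmute(id = source, type_node = "company"),
  toy_edges %>% transmute(id = target, type_node = type)
) %>%
  distinct()

# Every source and target must appear as a node id
all_endpoints_covered <- all(c(toy_edges$source, toy_edges$target) %in% toy_nodes$id)
```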

Tidygraph model

mc3_graph <- tbl_graph(nodes = mc3_nodes1,
                       edges = filtered_mc3_edges_owner,
                       directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
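To build intuition for the betweenness values, the sketch below computes them on a made-up star-shaped graph with one owner linked to three companies. Edges are given as integer row positions in the node table, which is an assumption made here to keep the example simple.

```r
library(dplyr)
library(tidygraph)

# Toy nodes: three companies and one owner
toy_nodes <- tibble(
  id        = c("Co1", "Co2", "Co3", "Ann"),
  type_node = c("company", "company", "company", "Beneficial Owner")
)

# Each company (rows 1 to 3) is linked to Ann (row 4)
toy_edges <- tibble(from = 1:3, to = rep(4L, 3))

toy_graph <- tbl_graph(nodes = toy_nodes,
                       edges = toy_edges,
                       directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness())

toy_result <- toy_graph %>%
  activate(nodes) %>%
  as_tibble()

toy_result
```

Ann lies on the shortest path between every pair of companies (three pairs), so her betweenness is 3, while each company has 0. In the real graph, owners who connect many companies stand out in the same way.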
#| echo: false
#| fig-width: 4
#| fig-height: 4

# Preparing edges tibble data frame
edges_df <- mc3_graph %>%
  activate(edges) %>%
  as_tibble()  # as.tibble() is deprecated


# Preparing nodes tibble data frame
nodes_df <- mc3_graph %>%
  activate(nodes) %>%
  as_tibble() %>%
  rename(label = id) %>%
  mutate(id=row_number()) %>%
  select(everything()) %>%
  relocate(id, .before = label)

nodes_df <- nodes_df %>%
  rename(group = type_node) 


# Plot the network graph with labeled nodes using visNetwork
visNetwork(nodes_df, edges_df, main = list(text = "Network Graph of Company and Beneficial Owner",
                                           style = "color: black; font-weight: bold; text-align: center;")) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name ="font-awesome") %>%
  visGroups(groupname = "company", shape = "icon",
            icon = list(code = "f0f7", color = "#000000")) %>%
  visGroups(groupname = "Beneficial Owner", shape = "icon",
            icon = list(code = "f2bd")) %>%
  visLegend() %>%
  visOptions(
    highlightNearest = TRUE,
    nodesIdSelection = TRUE,
  ) %>%
  visInteraction(
    zoomView = TRUE,
    dragNodes = TRUE,
    dragView = TRUE,
    navigationButtons = TRUE,
    selectable = TRUE,  # Enable node selection
    hover = TRUE,  # Enable hover effects
  )
#| echo: false
#| fig-width: 4
#| fig-height: 4

# Set a seed for reproducibility
set.seed(123)

ggraph_own <- mc3_graph %>%
ggraph(layout = "fr") +
  # Constant alpha and colour are set outside aes(); only size is mapped to data
  geom_edge_link(alpha = 0.5) +
  geom_node_point(aes(size = betweenness_centrality),
                  colour = "lightblue",
                  alpha = 0.5) +
  scale_size_continuous(range=c(1,10))+
  theme_graph()
ggraph_own

Top 10 Owners

filtered_mc3_edges_owner
# A tibble: 313 × 5
   source                           target         type  weights no_of_companies
   <chr>                            <chr>          <chr>   <int>           <dbl>
 1 Acevedo, Dickson and Gonzalez    Richard Smith  Bene…       1               6
 2 Adams Group                      John Smith     Bene…       1               9
 3 Adams-Pope                       Michelle Rodr… Bene…       1               4
 4 Adriatic Catch S.A. de C.V.      David Jones    Bene…       1               6
 5 Albertine Rift  NV Family        Michael Taylor Bene…       1               4
 6 Alexander PLC                    David Jones    Bene…       1               6
 7 Alvarez Ltd                      Michael Carter Bene…       1               5
 8 Alvarez, Young and Ramos         Michael Miller Bene…       1               5
 9 Ancla del Este Ltd. Liability Co Aaron Jones    Bene…       1               4
10 Ancla del Este Sp Fish           John Jones     Bene…       1               4
# ℹ 303 more rows
top_10_ids <- filtered_mc3_edges_owner %>%
  select(target, no_of_companies) %>%
  distinct() %>%
  arrange(desc(no_of_companies)) 

DT::datatable(top_10_ids)

John Smith and Michael Johnson, for instance, are among the beneficial owners with the most companies. The edge listing above shows John Smith with no_of_companies = 9.

5.2 Building network model with tidygraph for Company Contacts

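Similarly, to plot the network graph of companies and company contacts, we follow the same steps as above. Note that no_of_companies is counted from Beneficial Owner edges only, so the filter below keeps company contacts who are also beneficial owners of more than 3 companies.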

#| echo: false
#| fig-width: 4
#| fig-height: 4

#Filter the type = "Company Contacts" to create the edge data table
mc3_edges_cc<- mc3_edges_with_no_of_companies %>%
  filter(no_of_companies > 3, type == "Company Contacts") 

# Create the nodes data table
# Create a data frame with source nodes and rename column
id3 <- mc3_edges_cc %>%
  select(source) %>%
  rename(id = source) %>%
  mutate(type_node = "company")

# Create a data frame with target nodes and rename column
id4 <- mc3_edges_cc %>%
  select(target, type) %>%
  rename(id = target, type_node = type)

# Combine the two data frames and remove duplicates
mc3_nodes2 <- rbind(id3, id4) %>%
  distinct()


#Building the tidygraph model for company contacts
mc3_graph2 <- tbl_graph(nodes = mc3_nodes2,
                       edges = mc3_edges_cc,
                       directed = FALSE) %>%
  mutate(betweenness_centrality = centrality_betweenness(),
         closeness_centrality = centrality_closeness())
#| echo: false
#| fig-width: 4
#| fig-height: 4

# Preparing edges tibble data frame
edges_df_2 <- mc3_graph2 %>%
  activate(edges) %>%
  as_tibble()  # as.tibble() is deprecated

# Preparing nodes tibble data frame
nodes_df_2 <- mc3_graph2 %>%
  activate(nodes) %>%
  as_tibble() %>%
  rename(label = id) %>%
  mutate(id=row_number()) %>%
  select(everything()) %>%
  relocate(id, .before = label)

nodes_df_2 <- nodes_df_2 %>%
  rename(group = type_node) 

# Plot the network graph with labeled nodes using visNetwork
visNetwork(nodes_df_2, edges_df_2, main = list(text = "Network Graph of Company and Company Contacts",
                                           style = "color: black; font-weight: bold; text-align: center;")) %>%
  visIgraphLayout(layout = "layout_with_fr") %>%
  visLayout(randomSeed = 123) %>%
  addFontAwesome(name ="font-awesome") %>%
  visGroups(groupname = "company", shape = "icon",
            icon = list(code = "f0f7", color = "#000000")) %>%
  visGroups(groupname = "Company Contacts", shape = "icon",
            icon = list(code = "f0c0")) %>%
  visOptions(
    highlightNearest = TRUE,
    nodesIdSelection = TRUE,
  ) %>%
  visLegend() %>%
  visInteraction(
    zoomView = TRUE,
    dragNodes = TRUE,
    dragView = TRUE,
    navigationButtons = TRUE,
    selectable = TRUE,  # Enable node selection
    hover = TRUE,  # Enable hover effects
  )
#| echo: false
#| fig-width: 4
#| fig-height: 4

# Set a seed for reproducibility
set.seed(123)

mc3_graph2 %>%
ggraph(layout = "fr") +
  # Constant alpha and colour are set outside aes(); only size is mapped to data
  geom_edge_link(alpha = 0.5) +
  geom_node_point(aes(size = betweenness_centrality),
                  colour = "lightblue",
                  alpha = 0.5) +
  scale_size_continuous(range=c(1,10))+
  theme_graph()